Abstract: In the past decade, the pharmaceutical industry and biomedical research sector have
devoted considerable resources to pharmacogenomics (PGx) with the hope that understanding 
genetic variation in patients would deliver on the promise of personalized medicine. With the 
advent of new technologies and the improved collection of DNA samples, the roadblock to 
advancements in PGx discovery is no longer the lack of high-density genetic information cap- 
tured on patient populations, but rather the development, adaptation, and tailoring of analytical 
strategies to effectively harness this wealth of information. The current analytical paradigm in 
PGx considers the single-nucleotide polymorphism (SNP) as the genomic feature of interest and 
performs single SNP association tests to discover PGx effects - ie, genetic effects impacting 
drug response. While it can be straightforward to process single SNP results and to consider 
how this information may be extended for use in downstream patient stratification, the rate 
of replication for single SNP associations has been low and the desired success of producing 
clinically and commercially viable biomarkers has not been realized. This may be due to the 
fact that single SNP association testing is suboptimal given the complexities of PGx discovery 
in the clinical trial setting, including: 1) relatively small sample sizes; 2) diverse clinical cohorts
within and across trials due to genetic ancestry (potentially impacting the ability to replicate 
findings); and 3) the potential polygenic nature of a drug response. Subsequently, a shift in 
the current paradigm is proposed: to consider the gene as the genomic feature of interest in 
PGx discovery. The proof-of-concept study presented in this manuscript demonstrates that 
genomic region-based association testing has the potential to improve the power of detect- 
ing single SNP or complex PGx effects in the discovery stage (by leveraging the underlying 
genetic architecture and reducing the multiplicity burden), and it can also improve power in 
the replication stage. 

Keywords: variance components, pharmacogenomics strategy, pharmacogenomics replication, 
pharmacogenomics discovery, personalized medicine 

Introduction 

In the past decade, the pharmaceutical industry and biomedical research sector have 
devoted considerable resources to pharmacogenomics (PGx) with the hope that under- 
standing genetic variation in patients would deliver on the promise of personalized 
medicine. 1 While technological breakthroughs have been realized in high-density 
single-nucleotide polymorphism (SNP) genotyping and DNA sequencing, with 
similar advancements made in the understanding of disease etiology, the discover- 
ies resulting from the investigations of genetic variation and drug response have 
been limited. The roadblock to progress in PGx discovery is no longer in obtaining 
high-dimensional genetic data on patient populations, but rather in how to effectively
translate the wealth of information available into value for
clinical development programs. This subsequently requires 
advancements to be made in the analytics that inform PGx 
strategy. 

In the context of patient stratification/selection being the 
ultimate goal, development of a clinically and commercially 
viable biomarker based on DNA sequence variation is a 
complex process that can be generally framed as a two-stage 
process: 1) identification of relevant genomic features (for 
example, genes, exons, and SNPs); and 2) translation of 
PGx discovery results into a patient subgroup. In this sense, 
downstream success in PGx-driven patient stratification/ 
selection strategies ultimately hinges on the ability to 
identify the relevant genetic factors in the initial stage of PGx 
discovery. The focus of this manuscript is on the first stage, 
which also has applications in understanding the mode of 
action and drug target identification. It should be noted that 
the second stage is the focus of the field termed "subgroup 
identification" (for example, Li et al 2 and Lipkovich et al 3),
and is the critical next step in translating findings from the 
discovery stage into clinical and commercial value in the form 
of a diagnostic tool (for example, a laboratory-developed test, 
a clinical laboratory test, or companion diagnostic). 

Currently, most drugs enter Phase II/III clinical 
development with a hypothesis-generating PGx program 
due to a lack of prior empirical evidence on the drug-gene 
relationship with clinical response. This means that any analytical
strategy for PGx discovery must account for the
unique stressors of translational research in the clinical trial
setting: 1) the relatively small sample sizes; 2) the diverse
clinical cohorts within and across trials due to genetic ancestry
(potentially impacting the ability to replicate findings);
3) the potential polygenic nature of a drug response; and
4) the business implications of costs associated with both
false positives and false negatives. 4

Consider a hypothetical PGx study conducted on a 
placebo-controlled clinical trial with DNA samples assayed 
on an SNP genotyping platform in which the objective is to 
identify the genes/variants associated with treatment-specific 
efficacy. The common analytical approach used in this case 
is single SNP association testing (SSAT). Unfortunately, 
this analytical strategy has not delivered the desired value 
in terms of producing germline DNA-based classifiers with 
clinical utility for drug efficacy. 1 Additionally, the rate of 
replication for single SNP associations is quite low (poten- 
tially due to the impact of linkage disequilibrium [LD] on 
association testing and the differences in allele frequency 
between populations). 5 



Given this, it is necessary to consider a shift in the cur- 
rent paradigm for discovery in PGx - specifically, should an 
individual SNP remain the primary genomic feature of inter- 
est at the discovery stage, or should an alternative definition 
based on a set of SNPs be considered toward improving the 
chances of success? In the context of PGx discovery without 
prior information, we recommend defining the genomic fea- 
ture of interest as a gene, since this represents a biologically 
relevant unit of genetic variation with structural annotation 
that is independent of ancestry and is generalizable across 
studies. 

While the concept of gene-based testing has been the 
subject of recent research in genome-wide association studies
of human disease, 5–9 the PGx space has been slow to adopt
this approach, potentially due to the nontrivial translation of
gene-based discoveries to a patient stratification/selection
strategy. For a single SNP effect, moving from PGx discovery 
to patient stratification/selection is relatively straightfor- 
ward, as patients can be assigned to a subgroup based on 
their genotype call at a single locus. For a gene-level effect 
(or multi-SNP effect), moving from PGx discovery to patient
stratification/selection requires refinement steps toward iden- 
tifying the critical SNPs and functional form necessary to 
define the patient subgroup. However, it is important to note 
that while the transition to patient stratification/selection is 
easier for single SNP effects, there are inherent limitations 
in terms of subgroup size (due to allele frequency) and it is 
possible that a single SNP may not be sufficient in defining 
a classifier with clinical and commercial utility. 

To realize the concept of gene-based testing, a variety of 
statistical methodologies have been developed to effectively 
harness the information captured by a set of SNPs, referred 
to herein more generally as region-based association test- 
ing (RBAT), as these methods are not restricted to only 
considering a gene, and it may be of interest to consider 
an alternative unit of genomic variation depending on the 
objectives of a given PGx study. 6–9 In this work, a comparison
is made across SSAT, RBAT that jointly tests the effect of a 
set of SNPs in a gene, and RBAT that considers identifying 
a gene as significant based on the minimum adjusted single 
SNP P-value. 

In general, we demonstrate that a genomic region-based 
testing strategy can: 1) be more powerful (by leveraging the 
underlying genetic architecture and reducing the multiplicity 
burden); 2) lend to the detection of complex genetic effects; 
and 3) improve the likelihood of replication. Here, we pro- 
vide proof-of-concept (POC) evaluations in support of a 
genomic region-based testing strategy for PGx discovery and
replication, and we offer an analytical framework conducive
to tailoring and PGx study design. 

Materials and methods 

In most PGx studies, the number of genetic markers to be
investigated in the PGx discovery stage (even in a candidate
gene study) is larger than the available number (n) of samples
(ie, number of genetic markers >n), and thus subgroup identification
approaches (such as those by Li et al) 2 are often not
directly applicable. It is therefore often necessary to narrow a
larger set of genomic regions to a focused set of genomic regions
before estimating a genetic signature for the purpose of patient
stratification. The evaluation of different analytical strategies to
approach this prescreening (or discovery) step is the focus of
this manuscript, where emphasis is given to association testing
as the statistical method of choice for the initial screening.

Herein, the focus will be on common SNPs having a 
minor allele frequency (MAF) >5%. However, similar strat- 
egies as described in this manuscript can be employed for 
low frequency (ie, 1%< MAF <5%) or rare (MAF <1%) 
variants, although some methods may not be applicable or 
may require adaptation in this context. 

Association testing framework 

Assume that genotype data from patients in a placebo- 
controlled, two-arm clinical trial will be collected, and that 
it is of interest to identify genetic markers with treatment- 
specific effects. Testing for an association between a single, 
or a set of, genetic marker(s) and a univariate continuous 
outcome of interest in the context of PGx studies can be 
framed in a standard linear regression framework: 

$$y = X^{\top}\alpha + T\beta_T + f_G(G) + f_{G \times T}(G) + \varepsilon, \quad (1)$$

where $y$ is the phenotype vector, $\alpha$ is a vector of coefficients
(fixed effects) for covariates contained in $X$, $\beta_T$ is the treatment
effect, $T$ a vector of the treatment indicators for each
patient (1 = treated, 0 = placebo), $f_G(G)$ is a function of the
genotype matrix $G$ representing a potential genetic main
effect (ie, a prognostic effect), and $\varepsilon \sim N(0, \sigma^2 I)$ is the error
term. The effect of interest when aiming to identify treatment-specific
effects is the interaction term, $f_{G \times T}(G)$. The functional
form of $f$ determines what type of association testing is
performed, as outlined in the following sections. 

Single SNP association testing (SSAT) 

SSAT is the most commonly used approach in PGx studies. 
The functional form $f$ for SSAT reduces to,

$$f_G(G) = G_j\beta_{SNP_j} \text{ and } f_{G \times T}(G) = TG_j\beta_{SNP_j \times T}, \quad (2)$$

where $G_j$ is the genotype matrix containing genotypes of
SNP $j$, coded using a genotypic model, for patients $i = 1, \ldots, n$.
The test of interest is $H_0: \beta_{SNP_j \times T} = 0$ versus $H_a: \beta_{SNP_j \times T} \neq 0$,
where significance is determined via a $g - 1$ degree of
freedom likelihood ratio test, where $g$ is the number of
observed genotypes for SNP $j$. It is important to note that,
without loss of generality, it is trivial to consider alternative
codings/genetic models, such as additive, dominant,
or recessive. 
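
As a rough illustration of the SSAT interaction test described above, the following minimal Python sketch fits the null and full linear models by ordinary least squares and compares them with a likelihood ratio test. It is not the authors' implementation: covariates $X$ are omitted, genotypes are assumed to be coded 0/1/2 with 0 as the reference (AA), the helper names (`solve`, `rss`, `ssat_pvalue`) are hypothetical, and the toy data use a balanced genotype layout and a deliberately large interaction effect purely for demonstration.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination.

    Assumes a is nonsingular (every genotype must occur in both arms).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(design, y):
    """Residual sum of squares of an OLS fit via the normal equations."""
    k = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(design, y))


def ssat_pvalue(y, t, g):
    """LRT P-value for the SNP-by-treatment interaction (genotypic coding)."""
    levels = sorted(set(g))[1:]          # non-reference genotypes
    df = len(levels)                     # g - 1 degrees of freedom
    if df not in (1, 2):
        raise ValueError("expected two or three observed genotypes")
    # Null model: intercept, treatment, genotype main effects.
    null = [[1.0, float(ti)] + [float(gi == lv) for lv in levels]
            for ti, gi in zip(t, g)]
    # Full model: null model plus genotype-by-treatment terms.
    full = [row + [ti * d for d in row[2:]] for row, ti in zip(null, t)]
    stat = max(0.0, len(y) * math.log(rss(null, y) / rss(full, y)))
    # Closed-form chi-square survival function for df = 1 or df = 2.
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    return math.exp(-stat / 2.0)


# Toy data (assumed, not from the study): balanced arms and genotypes.
rng = random.Random(0)
n = 600
t = [i % 2 for i in range(n)]
g = [(i // 2) % 3 for i in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_null = [0.3 * ti + e for ti, e in zip(t, noise)]
y_pgx = [0.3 * ti + 2.0 * ti * (gi > 0) + e for ti, gi, e in zip(t, g, noise)]
print(ssat_pvalue(y_null, t, g), ssat_pvalue(y_pgx, t, g))
```

The closed-form survival functions keep the sketch free of external dependencies; a statistics library would normally be used for the regression and the chi-square tail probability.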

Region-based association testing (RBAT) 

As outlined in the introduction, SSAT has been used 
since the inception of PGx studies despite suboptimal 
performance in the context of translational research in 
the clinical trial setting (for example, in the case of rela- 
tively small sample sizes, diverse cohorts, and potential 
polygenic effects of drug responses). 4 Alternatively, RBAT 
approaches have the potential to improve power (by lever- 
aging the underlying LD structure in genetic data and 
reducing the multiplicity burden) and to detect complex 
PGx effects. While many RBAT approaches exist, this POC 
study will focus on contrasting the performance of SSAT 
with two exemplary RBAT approaches that are often used 
in disease genetics to demonstrate how these two types of 
association testing strategies can provide complementary 
information in the context of PGx studies. 

Single SNP region-based association testing (SS-RBAT) 

A straightforward approach to test the combined effect 
of a set of p SNPs that has been used in genetic analyses 
is to combine p P-values from SSAT using methods such 
as Fisher's method to combine P-values or by using the 
minimum P-value approach. 10,11 This manuscript will consider
the latter (ie, the significance of a set of $p$ SNPs in a
genomic region), evaluated via,

$$p_{r,\min}(p_{1,\text{adjusted}}, \ldots, p_{p,\text{adjusted}}) = \min_j (p_{j,\text{adjusted}}), \quad (3)$$

where $p_{j,\text{adjusted}}$ is the P-value of SNP $j$ based on SSAT after
adjusting for multiplicity (see the section titled, "Additional
considerations and performance metrics" for details regarding
multiplicity adjustment). 
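
The minimum P-value rule in equation (3) can be sketched in a few lines of Python. This is an illustration only: it assumes the multiplicity adjustment is a Bonferroni correction, as described for the discovery stage later in the manuscript, and it treats a region as significant when the minimum adjusted P-value is at or below alpha. The function names and the example P-values are hypothetical.

```python
def bonferroni(p_values, n_tests):
    """Bonferroni-adjust each P-value for n_tests tests, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


def ss_rbat(p_values, n_tests, alpha=0.05):
    """Region-level P-value as the minimum adjusted single SNP P-value."""
    p_region = min(bonferroni(p_values, n_tests))
    return p_region, p_region <= alpha


# Hypothetical SSAT P-values for three SNPs in one region, adjusted for
# 3,800 effective tests (the number used for SSAT in this study).
snp_p = [0.20, 4.0e-6, 0.03]
p_region, significant = ss_rbat(snp_p, n_tests=3800)
print(p_region, significant)
```

Because the region inherits the single SNP multiplicity burden, this rule flags a region exactly when at least one of its SNPs would be declared significant by SSAT.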

For the purposes of this POC study, the 
approach described in this section is sufficient and 
computationally efficient to demonstrate general 
performance trends.

Variance components region-based
association testing (VC-RBAT) 

A family of RBAT approaches that has gained popularity 
in recent years is known as kernel machine regression, and 
includes tests such as the sequence kernel association test. 9 
These approaches can be framed in the context of a linear 
mixed model and they ultimately test whether a significant 
proportion of phenotypic variability can be explained by 
genetic variation within a region of interest. Specifically, the 
functional form $f$ can be written as,

$$f_G(G) = \beta_R \text{ and } f_{G \times T}(G) = \beta_{R \times T}, \quad (4)$$

where $\beta_R \sim N(0, \sigma^2_R K_1(G))$ and $\beta_{R \times T} \sim N(0, \sigma^2_{R \times T} K_2(G))$
are assumed to be random effects following a multivariate
normal distribution, with variance-covariance matrices $K_1(G)$
representing the main effect due to genetic variability and
$K_2(G) = T \odot K_1(G)$ used for the region-by-treatment interaction -
ie, the effect of interest - where $\odot$ denotes the element-wise
(Hadamard) matrix product. $K_1$ and $K_2$ are also called kernels,
and they measure the genetic similarity between individuals;
several choices for these kernels are available. Since this POC
study focuses on common SNPs, an unweighted Identity-by- 
State kernel will be utilized for all evaluations. 7 
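
To make the kernel idea concrete, the sketch below computes an unweighted Identity-by-State kernel for a small genotype matrix. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and uses the common IBS definition in which two patients share $2 - |g_{ik} - g_{jk}|$ alleles at SNP $k$, averaged over $2p$ alleles. The interaction kernel is written as $(TT^{\top}) \odot K_1$, which is one possible reading of $T \odot K_1(G)$ when $T$ is a vector; that reading, the function names, and the toy genotypes are assumptions rather than details taken from the source.

```python
def ibs_kernel(genotypes):
    """Unweighted IBS kernel; genotypes[i][k] is the minor-allele count
    (0, 1, or 2) of patient i at SNP k."""
    p = len(genotypes[0])
    return [[sum(2 - abs(a - b) for a, b in zip(gi, gj)) / (2.0 * p)
             for gj in genotypes] for gi in genotypes]


def interaction_kernel(k1, t):
    """Assumed reading of T (Hadamard) K1: entry (i, j) is T_i * T_j * K1[i][j],
    so only pairs of treated patients contribute."""
    n = len(t)
    return [[t[i] * t[j] * k1[i][j] for j in range(n)] for i in range(n)]


# Toy example: three patients, three SNPs (hypothetical genotypes).
toy_genotypes = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
treatment = [1, 0, 1]
k1 = ibs_kernel(toy_genotypes)
k2 = interaction_kernel(k1, treatment)
print(k1[0][1], k1[0][2], k2[0][1], k2[0][2])
```

Patients with identical genotypes have similarity 1, and the similarity falls as fewer alleles are shared across the region.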

The test of interest for this specific RBAT approach then 
simplifies to the variance component test, $H_0: \sigma^2_{R \times T} = 0$ versus
$H_a: \sigma^2_{R \times T} > 0$, and thus this approach will be referred to herein
as variance components RBAT (VC-RBAT). Significance
will be determined using a linear score test introduced by
Qu et al 8 that allows for testing the significance of a variance
component in the presence of a nuisance variance component
(ie, $\sigma^2_R$) and has desirable properties when sample sizes
are small. 

For additional details around VC-RBAT approaches, 
please refer to Qu et al 8 and Wu et al. 9 

Simulation study to evaluate analytical 
strategies to identify subgroup-defining 
genetic markers 

Phenotypic and genotypic data were simulated to mimic 
clinical trial conditions and realistic human genetic variation. 
Assume that a candidate gene study evaluating common 
SNPs across 25 genes is being conducted as part of a Phase II 
(or Phase III) placebo-controlled clinical trial with 400 treated 
and 400 placebo patients. For the purpose of this POC study, 
all patients are Caucasians of European ancestry. Details
around the simulation of phenotypic and genotypic data are 
given in the following sections. 



Simulation of phenotypic data 

The simulation study outlined in this section aims to evaluate 
SSAT, SS-RBAT, and VC-RBAT in a focused set of realistic 
scenarios. Since the end goal of many PGx studies is to 
identify a subgroup of patients with enhanced treatment 
effects (ie, patient stratification), continuous outcomes were 
simulated (assuming that a subgroup solely defined by genetic 
markers exists) using the following model: 

$$y_i = \alpha_0 + T_i\beta_T + S_i\beta_S + T_iS_i\beta_{S \times T} + \varepsilon_i, \quad (5)$$

where $S_i$ is an indicator variable for subgroup membership,
$\beta_S$ the subgroup main effect (ie, a prognostic effect), $\beta_{S \times T}$ is
the enhanced treatment effect of patients belonging to the
subgroup, and $\varepsilon_i \sim N(0, \sigma^2 = 1)$ the error term.

Assume that a weak treatment effect with a small effect 
size is observed in the overall population: 

$$\Delta_T = (E(Y \mid i \in T) - E(Y \mid i \notin T))/\sigma = (E(Y \mid i \in T) - E(Y \mid i \notin T))/1 = 0.3, \quad (6)$$

where $\Delta_T$ is the treatment effect (ie, the difference in
means between the treated and placebo patients) scaled by
the standard deviation. Furthermore, assume that the treatment
effect can be partitioned into a genetic and a nongenetic
component via the subgroup-by-treatment interaction:

$$\Delta_T = 0.3 = 0.3C + (1 - C) \cdot 0.3 = |S| \cdot \frac{0.3C}{|S|} + (1 - C) \cdot 0.3 = |S| \cdot \beta_{S \times T} + \beta_T,$$

where $C$ is the proportion of the treatment effect explained
by genetics (ie, subgroup membership) and $|S|$ is the subgroup
size. Thus, $\beta_{S \times T} = 0.3C \cdot 1/|S|$ and $\beta_T = (1 - C) \cdot 0.3$
with $C \in \{0, 0.1, 0.2, \ldots, 1\}$ were used to simulate outcomes.
Note that for the purpose of this POC study, no subgroup
main effect was assumed (ie, $\beta_S = 0$ was used for all
simulations). 
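
The partition of the treatment effect and the outcome model in equation (5) can be sketched as follows. This is a minimal illustration, assuming an intercept $\alpha_0 = 0$ (the source does not state its value) and a subgroup size expressed as a proportion; the function names are hypothetical.

```python
import random


def effect_sizes(c, subgroup_size, delta_t=0.3):
    """Split the overall scaled treatment effect delta_t into a nongenetic
    part (beta_t) and a subgroup-by-treatment part (beta_sxt)."""
    beta_t = (1.0 - c) * delta_t
    beta_sxt = delta_t * c / subgroup_size
    return beta_t, beta_sxt


def simulate_outcomes(t, s, c, subgroup_size, rng, alpha0=0.0, beta_s=0.0):
    """Draw outcomes from equation (5) with unit error variance.

    alpha0 = 0 is an assumption; beta_s = 0 follows the POC study.
    """
    beta_t, beta_sxt = effect_sizes(c, subgroup_size)
    return [alpha0 + ti * beta_t + si * beta_s + ti * si * beta_sxt
            + rng.gauss(0.0, 1.0)
            for ti, si in zip(t, s)]


# With C = 0.6 and |S| = 0.3, the subgroup carries an extra effect of
# about 0.6 on top of a nongenetic effect of about 0.12.
print(effect_sizes(0.6, 0.3))
y = simulate_outcomes([1, 1, 0, 0], [1, 0, 1, 0], 0.6, 0.3, random.Random(1))
```

For every value of $C$, the weighted sum $|S| \cdot \beta_{S \times T} + \beta_T$ returns the overall effect of 0.3, so the scenarios differ only in how much of that effect is concentrated in the subgroup.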

Simulation of genotypic data and assignment 
of patients to genetically defined subgroups 

Subject-level genotype data for 254 common variants in an 
exemplary 63.3 kb region of the ABCA1 gene (chromosome 9: 
107627259-107690527; relative to human genome GRCh38 
reference assembly and Human Annotation Release 104; see 
Figure S1) was simulated using haplotypes from individuals
of European ancestry, as downloaded from the 1000 Genomes 
Project. 12

To allow for comparison across different scenarios, a subgroup
size $|S|$ of 30% was targeted across various subgroup
definitions. Subgroup membership for each patient was 
determined using the following three POC scenarios (note 
that AA denotes the major homozygote): 

1. Enhanced treatment effect is driven by a single SNP
located in an LD block:

$$S_i = \begin{cases} 1 & \text{if } G_{i,rs11790326} = AB, BB \\ 0 & \text{if } G_{i,rs11790326} = AA \end{cases}$$

2. Enhanced treatment effect is driven by a single SNP not in
LD with other SNPs (ie, maximum pairwise r of 0.2):

$$S_i = \begin{cases} 1 & \text{if } G_{i,rs199894164} = AB, BB \\ 0 & \text{if } G_{i,rs199894164} = AA \end{cases}$$

3. Enhanced treatment effect is driven by two SNPs located
in the same gene (but in different LD blocks):

$$S_i = \begin{cases} 0 & \text{if } G_{i,rs4100654} = AA \text{ or if } G_{i,rs12347784} = AA \\ 1 & \text{otherwise} \end{cases}$$

Note that the MAFs of SNPs rs11790326, rs199894164,
rs4100654, and rs12347784 are 0.19, 0.19, 0.08, and 0.09,
respectively, generating subgroup sizes of approximately 
30% for all three scenarios. 
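
For the single SNP scenarios, the link between haplotypes, genotypes, and subgroup membership can be illustrated with a small Python sketch. The source does not describe how haplotypes were combined into genotypes; pairing two haplotypes drawn at random with replacement is an assumption here, as are the toy haplotype pool, the function names, and the use of Hardy-Weinberg proportions for the expected carrier fraction.

```python
import random


def sample_genotypes(haplotypes, n_patients, rng):
    """Pair two randomly drawn haplotypes per patient and sum them to
    minor-allele counts (0, 1, or 2) at each SNP."""
    genotypes = []
    for _ in range(n_patients):
        h1, h2 = rng.choice(haplotypes), rng.choice(haplotypes)
        genotypes.append([a + b for a, b in zip(h1, h2)])
    return genotypes


def carrier_subgroup(genotypes, snp_index):
    """S_i = 1 for genotypes AB or BB at the SNP, 0 for AA."""
    return [1 if g[snp_index] > 0 else 0 for g in genotypes]


def expected_carrier_fraction(maf):
    """Expected share of AB/BB carriers under Hardy-Weinberg proportions."""
    return 1.0 - (1.0 - maf) ** 2


# Toy pool of 100 haplotypes over two SNPs; 19 carry the minor allele at
# SNP 0, matching the MAF of 0.19 reported for the single SNP scenarios.
pool = [[1, 0]] * 19 + [[0, 1]] * 10 + [[0, 0]] * 71
genotypes = sample_genotypes(pool, 2000, random.Random(7))
subgroup = carrier_subgroup(genotypes, snp_index=0)
print(sum(subgroup) / len(subgroup), expected_carrier_fraction(0.19))
```

In a real analysis the haplotype pool would come from the reference panel rather than a hand-built list, and LD between SNPs would be inherited from those haplotypes.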

Additional considerations and performance metrics 

For each of the 30 scenarios outlined in the sections titled,
"Simulation of phenotypic data" and "Simulation of genotypic
data and assignment of patients to genetically defined
subgroups", 1,000 datasets were simulated and randomly
combined to generate 500 pairs of trials representing a discovery
trial and a replication trial.

For each trial, P-values for SSAT, SS-RBAT, and VC-RBAT
were recorded for all variants in the ABCA1 region
or the entire ABCA1 region, respectively. For computational
efficiency, P-values for the remaining 24 genes and 3,546
independent variants across these genes were drawn from 
a uniform distribution, since only variants in gene ABCA1 
provided a genetic contribution to the treatment effect. 

For the discovery trial, multiplicity adjustment was performed
using a Bonferroni correction for 25 tests for VC-RBAT
and 3,800 effective tests for SSAT (after consideration
of dependency among SNPs). Significance was determined
using an alpha-level $\alpha_1$ of 0.05 or 0.20 (as an alternative,
relaxed threshold in the case where more risk is acceptable at
the discovery stage). For the replication trial, the multiplicity
adjustment was performed using a Bonferroni correction,
adjusting for the number of genes/SNPs that were significant
at alpha-level $\alpha_1$ in the discovery trial for VC-RBAT or
SSAT, respectively. Significance in the replication trial was
determined using an alpha-level $\alpha_2$ of 0.05.



For each scenario, the following performance metrics 
were estimated: 

1. Power of VC-RBAT and SS-RBAT to detect ABCA1 in
the discovery trial using alpha-level $\alpha_1$;

2. Power of SSAT to detect the subgroup-defining SNPs in
the discovery trial using alpha-level $\alpha_1$;

3. Power of VC-RBAT and SS-RBAT to discover and
replicate ABCA1 using alpha-levels $\alpha_1$ and $\alpha_2$; and

4. Power of SSAT to discover and replicate the subgroup-defining
SNPs using alpha-levels $\alpha_1$ and $\alpha_2$.

Note that power to discover and replicate in metrics 3 and
4 was calculated as the proportion among the 500 dataset
pairs where the unit of interest (ie, the SNP or gene) was
significant after multiplicity adjustment in the discovery trial
using alpha-level $\alpha_1$ and significant after multiplicity adjustment
in the replication trial using alpha-level $\alpha_2$.

Additionally, type 1 error was estimated for scenarios 
where C=0 (ie, no genetic contribution to the treatment 
effect exists). 
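
The discover-and-replicate rule and the resulting power estimate can be sketched as below. This is an illustration under stated assumptions: significance is taken as an unadjusted P-value at or below alpha divided by the number of tests (equivalent to a Bonferroni adjustment), each trial pair is summarized by a hypothetical tuple of the unit's discovery P-value, its replication P-value, and the number of units carried forward from discovery, and the example values are invented for demonstration.

```python
def discovered_and_replicated(p_disc, p_rep, n_tests, n_carried,
                              alpha1, alpha2=0.05):
    """True if the unit passes the Bonferroni threshold in discovery
    (alpha1 / n_tests) and in replication (alpha2 / n_carried)."""
    if p_disc > alpha1 / n_tests:
        return False
    return p_rep <= alpha2 / n_carried


def estimate_power(trial_pairs, n_tests, alpha1, alpha2=0.05):
    """Proportion of trial pairs in which the unit is discovered and replicated."""
    hits = sum(discovered_and_replicated(pd, pr, n_tests, nc, alpha1, alpha2)
               for pd, pr, nc in trial_pairs)
    return hits / len(trial_pairs)


# Per-test discovery thresholds at alpha1 = 0.05: 25 gene-level tests
# versus 3,800 effective single SNP tests.
print(0.05 / 25, 0.05 / 3800)

# Hypothetical gene-level results for four trial pairs:
# (discovery P-value, replication P-value, genes significant in discovery).
pairs = [(0.001, 0.01, 2), (0.001, 0.04, 2), (0.01, 0.001, 1), (0.0005, 0.02, 1)]
print(estimate_power(pairs, n_tests=25, alpha1=0.05))
```

The same proportion computed in scenarios with $C = 0$ gives the type 1 error estimate, and the contrast between the two printed thresholds shows how testing 25 genes instead of 3,800 effective SNPs reduces the multiplicity burden.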

Results 

This section summarizes the results (see Table 1; 
Figures 1 and 2) from the simulation study outlined in the 
section titled, "Simulation study to evaluate analytical strat- 
egies to identify subgroup-defining genetic markers". The 
following general observations were made for the scenarios 
considered in this POC study: 

PGx discovery stage: 

• In the context of no PGx effect, type 1 error is preserved 
(ie, the probability of a false positive is controlled at the 
desired level); 

• If the PGx subgroup-defining SNP is located in a region 
of high LD, RBAT approaches considered here gener- 
ally outperform SSAT, where the multimarker approach 
(ie, VC-RBAT) had improved power over the single SNP 
region-based approach (ie, SS-RBAT); 

• If the PGx subgroup-defining SNP is not in LD with
any other SNP in the region then, as expected, SSAT
and SS-RBAT have comparable power and VC-RBAT
has negligible power due to the inability to leverage
information captured by other SNPs in LD with the PGx
subgroup-defining SNP; and

• In the context of a multi-SNP effect, both RBAT
approaches have higher power than SSAT.

Table 1 Type 1 error in the PGx discovery stage

Approach    $\alpha$=0.05    $\alpha$=0.2
VC-RBAT     0.044            0.184
SS-RBAT     0.046            0.192
SSAT        0.046            0.192

Abbreviations: PGx, pharmacogenetics; VC-RBAT, variance components region-based
association testing; SS-RBAT, single single-nucleotide polymorphism region-based
association testing; SSAT, single single-nucleotide polymorphism association
testing.

[Figure 1 panels not reproduced. Panels A to F plot power against the genetic contribution to treatment effect (10% to 100%) for Scenarios 1 to 3 at $\alpha_1$ = 0.05 and $\alpha_1$ = 0.2. Series: VC-RBAT, power to detect ABCA1; SS-RBAT, power to detect ABCA1; SSAT, power to detect the subgroup-defining SNP(s) (rs11790326 in Scenario 1; rs199894164 in Scenario 2; rs4100654 and rs12347784 in Scenario 3).]

Figure 1 Power estimates for the PGx discovery stage for all scenarios.

Notes: Power estimates of VC-RBAT and SS-RBAT to detect ABCA1 and the power of SSAT to detect the subgroup-defining SNPs in the discovery trial using alpha-level $\alpha_1$
are shown for all scenarios considered in this POC study. Scenario 1 (A and B): Enhanced treatment effect is driven by a single SNP located in an LD block. Scenario 2
(C and D): Enhanced treatment effect is driven by a single SNP not in LD with other SNPs. Scenario 3 (E and F): Enhanced treatment effect is driven by two SNPs located
in the same gene (but in different LD blocks).

Abbreviations: VC-RBAT, variance components region-based association testing; SS-RBAT, single single-nucleotide polymorphism region-based association testing;
SSAT, single single-nucleotide association testing; PGx, pharmacogenomics; SNPs, single-nucleotide polymorphisms; POC, proof-of-concept; LD, linkage disequilibrium.

PGx replication stage: 

• RBAT approaches demonstrate the potential to improve 
the power to replicate PGx findings across the scenarios 
evaluated in the POC; and 

• Relaxing the alpha threshold (ie, taking more risk) in 
the discovery stage improves the power to replicate PGx 
findings across the scenarios evaluated in the POC. 

In summary, the results of the POC study presented here 
demonstrate that the performance of the selected statistical 
frameworks is dependent on both the true underlying PGx 
effect, as well as on the genomic architecture in the region 
where the PGx subgroup-defining SNP is located. Although 
in many scenarios power is limited across all approaches, this 
demonstrates the value of considering alternatives to SSAT, as 



these region-based approaches may provide complementary 
information not obtained by SSAT. 

Discussion 

To appreciate the need for advancement in PGx, one can 
start by taking inventory of the success in this space since 
the approval of Herceptin® (Genentech, Inc., South San 
Francisco, CA, USA) (the first drug with PGx/biomarker 
information on its label). A review of the United States 
Food and Drug Administration's (FDA's) table of pharmaco- 
genomic biomarkers in drug labeling 1 revealed that only 12% 
of drugs since Herceptin had PGx/biomarker information 
in their label and only 14 of these labels direct clinicians to 
utilize testing prior to prescription. Clearly, there is room for 
improvement in PGx-driven patient stratification/selection 
for therapeutic development and intervention; however, the 
current paradigm and PGx analytics are failing to produce 
biomarkers with clinical and commercial utility. 1 

[Figure 2 panels not reproduced. Panels A and B are bar charts of power for Scenario 1, Scenario 2, and Scenario 3; panel B is titled $\alpha_1$ = 0.2, $\alpha_2$ = 0.05. Series: VC-RBAT, power to detect ABCA1; SS-RBAT, power to detect ABCA1; SSAT, power to detect subgroup-defining SNP 1; SSAT, power to detect subgroup-defining SNP 2.]

Figure 2 Power estimates for the replication of PGx effects for selected scenarios.

Notes: Power estimates to discover and replicate (ie, performance metrics 3 and 4) for a targeted scenario where the genetic contribution to the treatment effect is 60%.
Power was calculated as the proportion among the 500 dataset pairs where the unit of interest (ie, the SNP or gene) was significant after multiplicity adjustment in the
discovery trial using alpha-level $\alpha_1$ (ie, 0.05 in A and 0.2 in B) and significant after multiplicity adjustment in the replication trial using alpha-level $\alpha_2$ (ie, 0.05 in A and B).
Similar observations were made across the entire range of genetic contributions.

Abbreviations: VC-RBAT, variance components region-based association testing; SS-RBAT, single single-nucleotide polymorphism region-based association testing;
SSAT, single single-nucleotide association testing; SNP, single-nucleotide polymorphism; PGx, pharmacogenomics.

While technological advancements tend to follow Moore's
law, 13 similar advancements are not necessarily realized in the
space of analytics. Specifically, the analytical "breakthroughs"
in the space of personalized medicine have not translated to 
the desired impact at the patient level, especially in the area of 
drug effectiveness, as noted by the review of the FDA's table. 
To our knowledge, aside from variations in the cytochrome
P450 enzymes, no genes have been identified that harbor
germline DNA variation that impacts drug efficacy in a
clinically relevant manner. At this point, incorporating better
analytical strategies in the early stages of PGx discovery is 
necessary to deliver empirical evidence toward transforming 
the personalized medicine landscape. 

It is subsequently proposed that consideration be given 
to a paradigm shift in what is generally the most commonly 
applied analytical strategy in the PGx discovery stage - 
specifically, shifting away from starting at the smallest 
unit of genetic variation (ie, a single variant) to starting at 
a larger unit of genetic variation (ie, aggregating informa- 
tion across genomic regions, such as from a gene). For the 
purposes of this manuscript, the genomic region was defined 
as the gene due to its consistent definition across clinical 
trial populations, ethnicities, and so on; 5 however, there 
is nothing that precludes other biologically relevant units 
of genomic variation, such as an exon or a pathway, from 
being considered. 

While various frameworks/methodologies exist for 
evaluating the impact of a genetic variation within a region, 
two exemplary RBAT approaches were implemented in this 
POC simulation study. The results outlined in the Results 



section demonstrate that RBAT approaches, independent of 
the statistical framework chosen (ie, SS-RBAT or VC-RBAT), 
tend to provide the following: 

1. Improved power to detect either single SNP or complex
PGx effects (dependent on the LD structure in the region 
where the PGx subgroup-defining SNP resides); and 

2. Improved power to replicate PGx findings (ie, the ability 
to discover a PGx effect and then replicate it in a subse- 
quent trial). 

Understanding that genomic variation within a specific 
region impacts drug response is the critical first step toward 
developing a patient stratification/selection strategy. The pre- 
vailing thought in this manuscript is that it is first necessary to 
identify the correct genomic region and then to subsequently 
refine these findings toward understanding which specific 
variants are driving the response. To this point, while the 
topic of developing a companion diagnostic (to determine 
who to treat) or a laboratory-developed test (to help inform 
physician practice patterns) is often discussed, 14 it is first
necessary to implement a rigorous analytical strategy capable 
of both discovering the important genes/variants and identify- 
ing patients based on this set of important genes/variants (ie, 
subgroup identification). Future analytical work is needed 
to refine and effectively utilize gene-level findings, with 
an emphasis in the field of subgroup identification, as this 
will be integral in translating PGx findings to value - that 
is, delivering tailored PGx-guided therapeutic development 
and intervention strategies.

Perhaps the most significant contribution of this manuscript
is the provision of a framework to inform PGx strategy
development via tailored simulations. For clinical development
programs to realize the added value of PGx, tailored
analytical strategies should be incorporated in the early stages
of discovery to increase the chances of achieving
the ultimate goal (whether that be the development
of a diagnostic for patient stratification, understanding the 
mechanism of action for a drug, or identifying new drug 
targets). Emphasizing analytics will be paramount in real- 
izing the potential of PGx and personalized medicine for 
the pharmaceutical and biotechnology industries, providers, 
payers, and patients.

Figure S1 Linkage disequilibrium plot of ABCA1 region used for simulation studies.
Abbreviation: MAF, minor allele frequency.